Abstract. SHA-1 is a widely used 1995 NIST cryptographic hash function standard that was 
officially deprecated by NIST in 2011 due to fundamental security weaknesses demonstrated 
in various analyses and theoretical attacks. 

Despite its deprecation, SHA-1 remains widely used in 2017 for document and TLS certificate 
signatures, and also in many software systems such as the GIT versioning system for integrity and 
backup purposes. 

A key reason behind the reluctance of many industry players to replace SHA-1 with a safer 
alternative is the fact that finding an actual collision has seemed to be impractical for the 
past eleven years due to the high complexity and computational cost of the attack. 

In this paper, we demonstrate that SHA-1 collision attacks have finally become practical 
by providing the first known instance of a collision. Furthermore, the prefix of the colliding 
messages was carefully chosen so that they allow an attacker to forge two PDF documents 
with the same SHA-1 hash that nevertheless display arbitrarily chosen distinct visual contents. 

We were able to find this collision by combining many special cryptanalytic techniques in 
complex ways and improving upon previous work. In total the computational effort spent is 
equivalent to $2^{63.1}$ SHA-1 compressions and took approximately 6500 CPU years and 100 
GPU years. As a result, while the computational power spent on this collision is larger than 
other public cryptanalytic computations, it is still more than 100000 times faster than a 
brute force search. 
1 Introduction 


A cryptographic hash function $H : \{0,1\}^* \to \{0,1\}^n$ is a function that computes for 
any arbitrarily long message $M$ a fixed-length hash value of $n$ bits. It is a versatile 
cryptographic primitive used in many applications including digital signature schemes, 
message authentication codes, password hashing and content-addressable storage. The 
security or even the proper functioning of many of these applications relies on the assumption 
that it is practically impossible to find collisions. A collision is a pair of distinct messages $x$, 
$y$ that hash to the same value $H(x) = H(y)$. A brute-force search for collisions based on 
the so-called birthday paradox has a well understood cost of $\sqrt{\pi/2} \cdot 2^{n/2}$ expected calls to 
the hash function. 
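
As a quick sanity check of this bound, the following Python sketch evaluates $\sqrt{\pi/2} \cdot 2^{n/2}$ for SHA-1's 160-bit output and compares it with an attack cost of $2^{63.1}$ compressions, the effort this paper reports for its collision. The function name `birthday_cost_log2` is ours and purely illustrative.

```python
import math

def birthday_cost_log2(n_bits: int) -> float:
    """log2 of the expected number of hash calls, sqrt(pi/2) * 2^(n/2)."""
    return 0.5 * math.log2(math.pi / 2) + n_bits / 2

brute_force_log2 = birthday_cost_log2(160)   # SHA-1 outputs 160 bits
attack_log2 = 63.1                           # attack cost quoted in this paper
speedup = 2 ** (brute_force_log2 - attack_log2)

print(f"birthday search: about 2^{brute_force_log2:.2f} calls")
print(f"speed-up of the attack: about {speedup:,.0f}x")
```

The birthday search comes out at roughly $2^{80.3}$ calls, so the attack is faster by a factor of roughly 150000, consistent with the claim of "more than 100000 times faster".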

The MD-SHA family of hash functions is the most well-known hash function family, 
which includes MD5, SHA-1 and SHA-2 that all have found widespread use. This family 
originally started with MD4 [30] in 1990, which was quickly replaced by MD5 [31] in 1992 
due to serious security weaknesses [7, 9]. Despite early known weaknesses of its underlying 
compression function [8], MD5 was widely deployed by the software industry for over a 
decade. A project MD5CRK that attempted to find a collision by brute force was halted 
early in 2004, when a team of researchers led by Xiaoyun Wang [43] demonstrated collisions 
for MD5 found by a groundbreaking special cryptanalytic attack that pioneered new 
techniques. In a major development, Stevens et al. [38] later showed that a more powerful 
type of attack (the so-called chosen-prefix collision attack) could be performed against 
MD5. This eventually led to the forgery of a Rogue Certification Authority that in principle 
completely undermined HTTPS security [39] in 2008. Despite this, even in 2017 there are 
still issues in deprecating MD5 for signatures [16]. 


Currently, the industry is facing a similar challenge in the deprecation of SHA-1, a 1995 
NIST standard [27]. It is one of the main hash functions of today, and it also has been facing 
important attacks since 2005. Based on previous successful cryptanalysis works [4, 2, 3] 
on SHA-0 [26] (SHA-1’s predecessor, that only differs by a single rotation in the message 
expansion function), a team again led by Wang [42] presented in 2005 the very first 
theoretical collision attack on SHA-1 that is faster than brute-force. This attack, while 
groundbreaking, was purely theoretical as its expected cost of $2^{69}$ calls to SHA-1 was 
practically out-of-reach. 


Therefore, as a proof of concept, many teams worked on generating collisions for 
reduced versions of SHA-1: 64 steps [6] (with a cost of $2^{35}$ SHA-1 calls), 70 steps [5] (cost 
$2^{44}$ SHA-1), 73 steps [13] (cost $2^{50.7}$ SHA-1) and finally 75 steps [14] (cost $2^{57.7}$ SHA-1) 
using extensive GPU computation power. 


In 2013, building on these advances and a novel rigorous framework for analyzing 
SHA-1, the current best collision attack on full SHA-1 was presented by Stevens [36] with 
an estimated cost of $2^{61}$ calls to the SHA-1 compression function. Nevertheless, a publicly 
known collision still remained out of reach. This was also highlighted by Schneier [32] in 
2012, when he estimated the cost of a SHA-1 collision attack to be around US$ 700K in 
2015, down to about US$ 173K in 2018 (using calculations by Walker based on a $2^{61}$ attack 
cost [36], Amazon EC2 spot prices and Moore’s Law), which he deemed to be within the 
resources of criminals. 


More recently, a collision for the full compression function underlying SHA-1 was 
obtained by Stevens et al. [37] using a start-from-the-middle approach and a highly efficient 
GPU framework (first used to mount a similar freestart attack on the function reduced 
to 76 steps [18]). This required only a reasonable amount of GPU computation power, 
about 10 days using 64 GPUs, equivalent to approximately $2^{57.5}$ calls to SHA-1 on GPU. 
Based on this attack, the authors projected that a collision attack on SHA-1 may cost 
between US$ 75K and US$ 120K by renting GPU computing time on Amazon EC2 [33] 
using spot-instances, which is significantly lower than Schneier’s 2012 estimates. These new 
projections had almost immediate effect when CABForum Ballot 152 to extend issuance 
of SHA-1 based HTTPS certificates was withdrawn [11], and SHA-1 was deprecated for 
digital signatures in the IETF’s TLS protocol specification version 1.3. 


Unfortunately, CABForum restrictions on the use of SHA-1 only apply to actively 
enrolled Certification Authority certificates and not to any other certificates. Examples include retracted 
CA certificates that are still supported by older systems (and CA certificates have indeed 
been retracted so that SHA-1 certificates can continue to be served to these older systems 
unchecked by CABForum regulations¹), and certificates for other TLS applications, including 
up to 10% of credit card payment systems [40]. SHA-1 thus remains in widespread use across 
the software industry for, e.g., digital signatures on software, documents, and many other 
applications, most notably in the GIT versioning system. 


Worth noting is that not only academic efforts have been spent on breaking hash 
functions. Nation-state actors [28, 22, 21] have been linked to the highly advanced espionage 
malware “Flame” that was found targeting the Middle-East in May 2012. As it turned out, 
it used a forged signature to infect Windows machines via a man-in-the-middle attack on 
Windows Update. Using a new technique of counter-cryptanalysis that is able to expose 
cryptanalytic collision attacks given only one message from a colliding message pair, it was 
proven that the forged signature was made possible by a then secret chosen-prefix attack 
on MD5 [35, 10]. 


Table 1: Colliding message blocks for SHA-1. 
2 Our contributions 


We are the first to exhibit an example collision for SHA-1, presented in Table 1, thereby 
proving that theoretical attacks on SHA-1 have now become practical. Our work builds 
upon the best known theoretical collision attack [36] with estimated cost of $2^{61}$ SHA-1 
calls. This is an identical-prefix collision attack, where a given prefix P is extended with 
two distinct near-collision block pairs such that they collide for any suffix S: 


$$\text{SHA-1}(P \,\|\, M_1^{(1)} \,\|\, M_2^{(1)} \,\|\, S) = \text{SHA-1}(P \,\|\, M_1^{(2)} \,\|\, M_2^{(2)} \,\|\, S)$$
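
The collision survives any common suffix $S$ because of the iterated structure of the hash: once two chaining values coincide, all subsequent identical blocks are processed identically. The sketch below illustrates this on a toy iterated hash with a 16-bit chaining value, an assumption made purely so that a collision can be found by brute force almost instantly. The names `toy_compress`, `toy_hash` and `find_block_collision` are hypothetical, a single colliding block pair stands in for the two near-collision block pairs, and nothing here reflects the actual attack.

```python
import hashlib

def toy_compress(cv: bytes, block: bytes) -> bytes:
    """Toy compression function with a 16-bit chaining value (insecure by design)."""
    return hashlib.sha1(cv + block).digest()[:2]

def toy_hash(blocks, iv=b"\x00\x00"):
    """Iterate the toy compression function over a list of blocks."""
    cv = iv
    for block in blocks:
        cv = toy_compress(cv, block)
    return cv

def find_block_collision(cv: bytes):
    """Brute-force two distinct blocks that map cv to the same chaining value."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(8, "big")
        out = toy_compress(cv, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        counter += 1

prefix = b"PREFIX-P"
m1, m2 = find_block_collision(toy_hash([prefix]))

# The two messages differ only in the colliding block, yet agree for every suffix.
for suffix in (b"suffix-A", b"another suffix", b""):
    assert m1 != m2
    assert toy_hash([prefix, m1, suffix]) == toy_hash([prefix, m2, suffix])
```

In real SHA-1 the padding also encodes the message length, but since both colliding messages have the same length the padding blocks are identical and the same argument applies.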


The computational effort spent on our attack is estimated to be equivalent to $2^{63.1}$ 
SHA-1 calls (see Section 6). There is certainly a gap between the theoretical attack as 
presented in [36] and our executed practical attack that was based on it. Indeed, the 
theoretical attack’s estimated complexity does not include the inherent relative loss in 
efficiency when using GPUs, nor the inefficiency we encountered in actually launching a 
large scale computation distributed over several data centers. Moreover, the construction of 
the second near-collision attack was significantly more complicated than could be expected 
from the literature. 

To find the first near-collision block pair $(M_1^{(1)}, M_1^{(2)})$ we employed the open-source 
code from [36], which was modified to work with our prefix $P$ given in Table 2 and for 
large scale distribution over several data centers. Finding the second near-collision block 
pair $(M_2^{(1)}, M_2^{(2)})$ that finishes the collision was significantly harder, as the attack cost is 
known to be significantly higher, but also because of additional obstacles. 


1 E.g., SHA-1 certificates are still being sold by CloudFlare at the time of writing: 
https://www.cloudflare.com/ssl/dedicated-certificates/ 


Table 2: Identical prefix of our collision. 
In Section 5 we will discuss in particular the process of building the second near-collision 
attack. Essentially we followed the same steps as for the first near-collision attack 
[36], combining many existing cryptanalytic techniques. Yet we further employed the SHA-1 
collision search GPU framework from Karpman et al. [18] to achieve a significantly more 
cost efficient attack. 

We also describe two new additional techniques used in the construction of the second 
near-collision attack. The first allowed us to use additional differential paths around step 
23 for increased success probability and more degrees of freedom without compromising the 
use of an early-stop technique. The second was necessary to overcome a serious problem of 
an unsolvable strongly over-defined system of equations over the first few steps of SHA-1’s 
compression function that threatened the feasibility of finishing this project. 

Our example colliding files only differ in two successive random-looking message 
blocks generated by our attack. We exploit these limited differences to craft two colliding 
PDF documents containing arbitrary distinct images. Examples can be downloaded from 
https://shattered.io; another smaller example is given in Section B.1. PDFs with the 
same MD5 hash have previously been constructed by Gebhardt et al. [12] by exploiting 
so-called Indexed Color Tables and Color Transformation functions. However, this method 
is not effective for many common PDF viewers that lack support for these functionalities. 
Our PDFs rely on distinct parsings of JPEG images, similar to Gebhardt et al.’s TIFF 
technique [12] and Albertini et al.’s JPEG technique [1]. Yet we improved upon these basic 
techniques using very low-level “wizard” JPEG features such that these work in all common 
PDF viewers, and even allow very large JPEGs that can be used to craft multi-page PDFs. 

Some details of our work will be made public later only when sufficient time to implement 
additional security measures has passed. This includes our improved JPEG technique and 
the source-code for our attack and cryptanalytic tools. 

The remainder of this paper is organized as follows. We first give a brief description 
of SHA-1 in Section 3. Then in Section 4 we give a high-level overview of our attack, 
followed by Section 5 that details the entire process and the cryptanalytic techniques 
employed, where we also highlight improvements with respect to previous work. Finally, 
we discuss in Section 6 the large-scale distributed computations required to find the two 
near-collision block pairs. The parameters used to find the second colliding block are given 
in the appendix, in Section A. 


3 The SHA-1 hash function 


We provide a brief description of SHA-1 as defined by NIST [27]. SHA-1 takes an arbitrary-length 
message and computes a 160-bit hash. It divides the (padded) input message into $k$ 
blocks $M_1, \ldots, M_k$ of 512 bits. The 160-bit internal state $CV_j$ of SHA-1, called the chaining 
value, is initialized to a predefined initial value $CV_0 = IV$. Each block $M_j$ is fed to a 
compression function $h$ that updates the chaining value, i.e., $CV_{j+1} = h(CV_j, M_{j+1})$, where 
the final $CV_k$ is output as the hash. 

The compression function $h$, given a 160-bit chaining value $CV_j$ and a 512-bit message 
block $M_{j+1}$ as inputs, outputs a new 160-bit chaining value $CV_{j+1}$. It mixes the message 
block into the chaining value as follows, operating on words, simultaneously seen as 32-bit 
strings and as elements of $\mathbb{Z}/2^{32}\mathbb{Z}$. The input chaining value is parsed as 5 words $a, b, c, d, e$, 
and the message block as 16 words $m_0, \ldots, m_{15}$. The latter are expanded into 80 words 
using the following recursive linear equation: 


$$m_i = (m_{i-3} \oplus m_{i-8} \oplus m_{i-14} \oplus m_{i-16})^{\circlearrowleft 1}, \quad \text{for } 16 \le i < 80.$$


Starting from $(A_{-4}, A_{-3}, A_{-2}, A_{-1}, A_0) := (e^{\circlearrowleft 2}, d^{\circlearrowleft 2}, c^{\circlearrowleft 2}, b, a)$, each $m_i$ is mixed into an 
intermediate state over 80 steps $i = 0, \ldots, 79$: 


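$$A_{i+1} = A_i^{\circlearrowleft 5} + \varphi_i(A_{i-1}, A_{i-2}^{\circlearrowright 2}, A_{i-3}^{\circlearrowright 2}) + A_{i-4}^{\circlearrowright 2} + K_i + m_i,$$


where $X^{\circlearrowleft r}$ and $X^{\circlearrowright r}$ denote rotation of the 32-bit word $X$ to the left and to the right by $r$ bits, 
and $\varphi_i$ and $K_i$ are predefined Boolean functions and constants: 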


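step                 $\varphi_i(x, y, z)$                                                     $K_i$
$0 \le i < 20$       $\varphi_{IF} = (x \wedge y) \vee (\neg x \wedge z)$                     0x5a827999
$20 \le i < 40$      $\varphi_{XOR} = x \oplus y \oplus z$                                    0x6ed9eba1
$40 \le i < 60$      $\varphi_{MAJ} = (x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$       0x8f1bbcdc
$60 \le i < 80$      $\varphi_{XOR} = x \oplus y \oplus z$                                    0xca62c1d6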


After the 80 steps, the new chaining value is computed as the sum of the input chaining 
value and the final intermediate state: 


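$$CV_{j+1} = (a + A_{80},\; b + A_{79},\; c + A_{78}^{\circlearrowright 2},\; d + A_{77}^{\circlearrowright 2},\; e + A_{76}^{\circlearrowright 2}).$$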
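
The description above can be turned into a small reference implementation. The following Python sketch follows the single-register formulation with state words $A_{-4}, \ldots, A_{80}$; the initial value and the padding rule are taken from the SHA-1 standard rather than from this section, and the names `compress`, `phi`, `rotl` and `rotr` are our own. It is meant as an illustration checked against Python's `hashlib`, not as the code used in the attack.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)  # from the standard
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)

def rotl(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK

def rotr(x, r):
    """Rotate a 32-bit word right by r bits."""
    return rotl(x, 32 - r)

def phi(i, x, y, z):
    """Boolean function of step i: IF, XOR, MAJ, XOR for the four rounds."""
    if i < 20:
        return (x & y) | (~x & z & MASK)
    if 40 <= i < 60:
        return (x & y) | (x & z) | (y & z)
    return x ^ y ^ z

def compress(cv, block):
    """One call to the compression function h(CV_j, M_{j+1})."""
    a, b, c, d, e = cv
    m = list(struct.unpack(">16I", block))
    for i in range(16, 80):                       # linear message expansion
        m.append(rotl(m[i - 3] ^ m[i - 8] ^ m[i - 14] ^ m[i - 16], 1))
    # A[k] stores A_{k-4}, so A[0..4] holds A_{-4}, ..., A_0.
    A = [rotl(e, 2), rotl(d, 2), rotl(c, 2), b, a]
    for i in range(80):
        k = i + 4                                 # list index of A_i
        A.append((rotl(A[k], 5)
                  + phi(i, A[k - 1], rotr(A[k - 2], 2), rotr(A[k - 3], 2))
                  + rotr(A[k - 4], 2) + K[i // 20] + m[i]) & MASK)
    # Feed-forward: add the input chaining value to (A_80, A_79, A_78, A_77, A_76).
    return ((a + A[84]) & MASK, (b + A[83]) & MASK,
            (c + rotr(A[82], 2)) & MASK, (d + rotr(A[81], 2)) & MASK,
            (e + rotr(A[80], 2)) & MASK)

def sha1(message: bytes) -> bytes:
    """SHA-1 of a byte string, using the standard length padding."""
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + struct.pack(">Q", 8 * len(message)))
    cv = IV
    for j in range(0, len(padded), 64):
        cv = compress(cv, padded[j:j + 64])
    return struct.pack(">5I", *cv)

assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```

Keeping all 85 state words in one list mirrors the notation $A_i$ used throughout the attack description, at the cost of a little memory compared with the usual five-register implementation.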


4 Overview of our SHA-1 collision attack 


We illustrate our attack from a high level in Figure 1, where differences between the two 
computations of the two colliding messages are depicted by red shading. Starting from 
identical chaining values for two messages, we use two pairs of blocks. The differences in the 
first block pair cause a small difference in the output chaining value, which is canceled by 
the difference in the second block pair, leading again to identical chaining values and hence 
a collision (indicated by (2)). We employ differential paths that are a precise description of 
differences in state words and message words and of how these differences should propagate 
through the 80 steps. 

Note that although the first five state words are fixed by the chaining value, one can 
freely modify message words and thus directly influence the next sixteen state words. 
Moreover, with additional effort this can be extended to obtain limited influence over 
another eight state words. However, control over the remaining state words (indicated by (1)) 
is very hard and thus requires very sparse target differences that correctly propagate with 
probability as high as possible. Furthermore, these need to be compatible with differences 
in the expanded message words. The key solution is the concept of local collisions [4], 
where any state bit difference introduced by a perturbation message bit difference is to be 
canceled in the next five steps using correction message bit differences. 

Fig. 1: Attack overview 

To ensure all message word bit differences are compatible with the linear message 
expansion, one uses a Disturbance Vector (DV) [4] that is a correctly expanded message 
itself, but where every “1” bit marks the start of a local collision. The selection of a good 
disturbance vector is very important for the overall attack cost. As previously shown by 
Wang et al. [42], the main reason for using two block pairs instead of just one pair is that 
this choice alleviates an important restriction on the disturbance vector, namely that there 
are no state differences after the last step. Similarly, it may be impossible to unite the 
input chaining value difference with the local collisions for an arbitrary disturbance vector. 
This was solved by Wang et al. [42] by crafting a tailored differential path (called the 
Non-Linear (NL) path, indicated by (3)) that over the first 16 steps connects the input 
chaining value differences to the local collision differences over the remaining steps (called 
the Linear path, referring to the linear message expansion dictating the local collision 
positions). 
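
Because the message expansion only uses XOR and rotation, it is linear over GF(2): the XOR difference of two expanded messages is itself a validly expanded message. This is what allows a disturbance vector to be chosen as an expanded message. The following minimal sketch checks the linearity on random inputs; `expand` is an illustrative helper, not code from the attack.

```python
import random

MASK = 0xFFFFFFFF

def rotl(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK

def expand(words):
    """Expand 16 message words into 80 words with the SHA-1 recurrence."""
    w = list(words)
    for i in range(16, 80):
        w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    return w

rng = random.Random(1)
m1 = [rng.getrandbits(32) for _ in range(16)]
m2 = [rng.getrandbits(32) for _ in range(16)]

# XOR of the two expansions equals the expansion of the XOR difference.
xor_of_expansions = [x ^ y for x, y in zip(expand(m1), expand(m2))]
expansion_of_xor = expand([x ^ y for x, y in zip(m1, m2)])
assert xor_of_expansions == expansion_of_xor
```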

One has to choose a good disturbance vector, then for each near-collision attack craft 
a non-linear differential path, determine a system of equations over all steps and finally 
find a solution in the form of a near-collision message block pair (as indicated by (4A) and 
(4B)). Note that one can only craft the non-linear path for the second near-collision attack 
once the chaining values resulting from the first block pair are known. This entire process 
including our improvements is described below. 


5 Near-collision attack procedure 
Fig. 2: The main steps for each near-collision attack. 


This section describes the overall procedure of each of the two near-collision attacks. 
Since we relied on our modification of Stevens’ public source-code [36, 15] for the first 
near-collision attack, we focus on our extended procedure for our second near-collision 
attack. As shown in Figure 2, this involves the following steps that are further detailed 
below: 


1. selection of the disturbance vector (same for both attacks); 
2. construction of the non-linear differential path; 
3. determine attack conditions over all steps; 
4. find additional conditions beyond the fixed differential path for early-stop; 
5. if necessary, fix solvability of attack conditions over the first few steps; 
6. find message modification rules to speed-up collision search; 
7. write the attack algorithm; 
8. finally, run the attack to find a near-collision block pair. 


5.1 Disturbance Vector selection 


The selection of which Disturbance Vector to use is the most important choice as this 
directly determines many aspects of the collision attack. These include the message XOR 
differences, but also in theory the optimal attack choices over the linear path, including 
the optimal set of candidate endings for the non-linear path together with optimal linear 
message bit equations that maximize the success probability over the linear part. 

Historically several approaches have been used to analyze a disturbance vector to 
estimate attack costs over the linear part. Initially, the Hamming weight of the DV that 
counts the active number of local collisions was used (see e.g. [3, 29]). For the first theoretical 
attack on SHA-1 with cost $2^{69}$ SHA-1 calls by Wang et al. [42] a more refined measure was 
used that counts the number of bit conditions on the state and message bits that ensure the 
differential path would be satisfied. This was later refined by Yajima et al. [45] to a more 
precise count by exploiting all possible so-called bit compressions and interactions through 
the Boolean functions. However, this approach does not allow any carry propagation 
resulting in alternate differential paths that may improve the overall success probability. 
Therefore, Mendel et al. [25] proposed to use the more accurate probability of single local 
collisions where carry propagations are allowed, in combination with known local collision 
interaction corrections. 

The current state-of-the-art is joint local-collision analysis introduced by Stevens 
[36, 34] which given sets of allowed differences for each state word $A_i$ and message word 
$m_i$ (given by the disturbance vector) computes the exact optimal success probability over 
the specified steps by exhaustively evaluating all differential paths with those allowed 
differences. This approach is very powerful as it also provides important information for the 
next steps, namely the set of optimal chaining value differences (by considering arbitrary 
high probability differences for the last five $A_i$s) and the set of optimal endings for the 
non-linear path together with a corresponding set of message bit equations, using which 
the optimal highest success probability of the specified steps can actually be achieved. The 
best theoretical collision attack on SHA-1 with cost $2^{61}$ SHA-1 calls [36] was built using 
this analysis. As we build upon this collision attack, we use the same disturbance vector, 
named II(52,0) by Manuel [23]. 


5.2 Construction of a non-linear differential path 


Once the disturbance vector and the corresponding linear part of the differential path 
have been fixed, the next step consists in finding a suitable non-linear path connecting the 
chaining value pair (with fixed differences) to the linear part. This step needs to be done 
separately for each near-collision attack of the full collision attack.² 

2 We eventually produced two message block pair solutions for the first near-collision attack. This provided 
a small additional amount of freedom in the search for the non-linear path of the second block. 

As explained for instance in [36], in the case of the first near-collision attack, the 
attacker has the advantage of two additional freedoms. Firstly, an arbitrary prefix can be 
included before the start of the attack to pre-fulfill a limited number of conditions on the 
chaining value. This allows greater freedom in constructing the non-linear path as this 
does not have to be restricted to a specific value of the chaining value pair, whereas the 
non-linear path for the second near-collision attack has to start from the specific given 
value of the input chaining value pair. Secondly, it can use the entire set of output chaining 
value differences with the same highest probability. The first near-collision attack is not 
limited to a particular value and succeeds when it finds a chaining value difference in this 
set, whereas the second near-collision attack has to cancel the specific difference in the 
resulting chaining value pair. Theory predicts the first near-collision attack to be at least a 
factor 6 faster than the second attack [36]. For our collision attack it is indeed the second 
near-collision attack that dominates the overall attack complexity. 


Historically, the first non-linear paths for SHA-1 were hand-crafted by Wang et al. 
Several algorithms were subsequently developed to automatically search for non-linear paths 
for MD5, SHA-1, and other functions of the MD-SHA family. The first automatic search for 
SHA-1 by De Cannière and Rechberger [6] was based on a guess-and-determine approach. 
This approach tracks the allowed values of each bit pair in the two related compression 
function computations. It starts with no constraints on the values of these bit pairs other 
than the chaining value pair and the linear part differences. It then repeatedly restricts 
values on a selected bit pair and then propagates this information via the step function 
and linear message expansion relation, i.e., it determines and eliminates previously-allowed 
values for other bit pairs that are now impossible due to the added restriction. Whenever a 
contradiction occurs, the algorithm backtracks and chooses a different restriction on the 
last selected bit pair. 


Another algorithm for SHA-1 was introduced by Yajima et al. [46] that is based on a 
meet-in-the-middle approach. It starts from two fully specified differential paths. The first 
is obtained from a forward expansion of the input chaining value pair, whereas the other is 
obtained from a backward expansion of the linear path. It then tries to connect these two 
differential paths over the remaining five steps in the middle by recursively iterating over 
all solutions over a particular step. 


A similar meet-in-the-middle algorithm was independently first developed for MD5 
and then adapted to SHA-1 by Stevens et al. [38, 34, 15], which operates on bit-slices and 
is more efficient. The open-source HashClash project [15] seems to be the only publicly 
available non-linear path construction implementation, which we improved as follows. 
Originally, it expanded a large set of differential paths step by step, keeping only the 
best N paths after each step, for some user-specified number N. However, there might be 
several good differential paths that result in the same differences and conditions around 
the connecting five steps, where either none or all lead to fully connected differential paths. 
Since we only need the best fully connected differential path we can find, we only need 
to keep a best differential path from each subset of paths with the same differences and 
conditions over the last five steps that were extended. So to remove this redundancy, for 
each step we extend and keep say the 4N best paths, then we remove all such superfluous 
paths, and finally keep at most N paths. This improvement led to a small but very welcome 
reduction in the amount of differential path conditions under the same path construction 
parameter choices, but also allowed a better positioning of the largest density of sufficient 
conditions for the differential path. 
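
The pruning rule described above can be sketched as follows. This is a minimal illustration under our own assumptions: a path is any object, `cost` ranks paths (lower is better, e.g. the number of conditions), and `boundary_key` extracts the differences and conditions over the last five extended steps. None of these names come from HashClash.

```python
def prune_paths(paths, n, cost, boundary_key):
    """Keep at most n paths, at most one per distinct boundary key.

    First shortlist the 4n cheapest paths, then drop every path whose
    boundary (last five extended steps) duplicates that of a cheaper path,
    and finally truncate to n paths.
    """
    shortlist = sorted(paths, key=cost)[:4 * n]
    best_per_key = {}
    for path in shortlist:
        # The shortlist is sorted by cost, so the first path seen per key is the best one.
        best_per_key.setdefault(boundary_key(path), path)
    return list(best_per_key.values())[:n]

# Hypothetical example: p1 and p2 share the same boundary, so p2 is superfluous.
paths = [
    {"name": "p1", "cost": 10, "boundary": "X"},
    {"name": "p2", "cost": 11, "boundary": "X"},
    {"name": "p3", "cost": 12, "boundary": "Y"},
    {"name": "p4", "cost": 13, "boundary": "Z"},
]
kept = prune_paths(paths, 2, cost=lambda p: p["cost"],
                   boundary_key=lambda p: p["boundary"])
print([p["name"] for p in kept])
```

Simply keeping the two cheapest paths would retain p1 and p2, which either both connect or both fail; with the redundancy removed, the slot of p2 goes to p3 instead.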


Construction of a very good non-linear path for the second near-collision attack using 
our improved HashClash version took little effort, and the improvements even 
allowed us to restrict the section with high density of conditions to just the first six steps. 
However, to find a very good non-linear differential path that is also solvable turned out to 
be very complicated. Our final solution is described in Section 5.5, which in the end did 
allow us to build our attack on the best non-linear path we found without any compromises. 
The fixed version of this best non-linear path is presented in Figure 3, Section A. 


5.3 Determine attack conditions 


Having selected the disturbance vector and constructed a non-linear path that bridges 
into the linear part, the next step is to determine the entire system of equations for the 
attack. This system of equations is expressed entirely over the computation of message 
$M^{(1)}$ and not over $M^{(2)}$ and consists of two types of equations. First, linear equations 
over message bits. These are used to control the additive signs of the message word XOR 
differences implied by the disturbance vector. Since there are many different “signings” 
over the linear part with the same highest probability, instead of one specific choice one 
uses a linear hull that captures many choices to reduce the amount of necessary equations. 
Secondly, linear equations over state bits given by a fixed differential path up to some step 
i (that includes the non-linear path). These control whether there is a difference in a state 
bit and which sign it has, furthermore they force target differences in the outputs of the 
Boolean functions $\varphi_i$. 


We determine this entire system by employing our implementation of joint-local collision 
analysis that has been improved as follows. JLCA takes input sets of allowed differences 
for each $A_i$ and $m_i$ and exhaustively analyzes the set of differential paths with those 
allowed differences, which originally was only used to analyze the linear part. We additionally 
provide it with specific differences for $A_i$ and $m_i$ as given by the non-linear path, so we 
can run JLCA over all 80 steps and have it output an optimal fixed differential path over 
steps 0,...,22 together with an optimal set of linear equations over message bits over the 
remaining steps. These are optimal results since JLCA guarantees these lead to the highest 
probability that is possible using the given allowed differences, but furthermore that a 
largest linear hull is used to minimize the amount of equations. 

Note that having a fixed differential path over more steps directly provides more state 
bit equations which is helpful in the actual collision search because we can apply the 
early-stop technique. However, this also adds further restrictions on $A_i$, limiting a set of 
allowed differences to a single specific difference. In our case limiting $A_{24}$ would result, 
besides a drop in degrees of freedom, in a lower overall probability, thus we only use a 
fixed differential path up to step 22, i.e., up to $A_{23}$. Below in Section 5.4 we show how we 
compensated for fewer state equations that the actual collision search uses to early stop. 


5.4 Find additional state conditions 


As explained in Section 5.3, the system of equations consists of linear equations over 
(expanded) message bits and linear equations over state bits. In the actual collision search 
algorithm we depend on these state bit equations to stop computation on a bad current 
solution as early as possible and start backtracking. These state bit equations are directly 
given by a fixed differential path, where every bit difference in the state and message is fixed. 
Starting from step 23 we allow several alternate differential paths that increase success 
probability, but also allow distinct message word differences that lead to a decrease in the 
overall number of equations. Each alternate differential path depends on its own (distinct) 
message word differences and leads to its own state bit equations. To find additional 
equations, we also consider linear equations over state and message bits around steps 21-25. 
Although in theory these could be computed by JLCA by exhaustively reconstructing all 
alternate differential paths and then determining the desired linear equations, we instead 
took a much simpler approach. We generated a large amount of random solutions of the 
system of equations up to step 31 using an unoptimized general collision search algorithm. 
We then proceeded to exhaustively test potential linear equations over at most 4 state bits 
and message bits around steps 21-25, which is quite efficient as on average only 2 samples 
needed to be checked for each bad candidate. The additional equations we found and used 
for the collision search are shown in Table 4, Section A. 
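
The sampling approach can be sketched in a few lines. The code below is an illustration under our own assumptions: each sample is a tuple of the relevant state and message bits of one partial solution, and a candidate equation is a set of at most four bit positions whose XOR must be constant over all samples. The function name and the synthetic data are hypothetical. A wrong candidate disagrees with any given sample with probability about 1/2, which is why on average only about 2 samples are inspected before it is rejected.

```python
import random
from itertools import combinations

def find_linear_equations(samples, max_bits=4):
    """Return (positions, constant) pairs whose XOR is constant on all samples."""
    n_bits = len(samples[0])
    found = []
    for weight in range(1, max_bits + 1):
        for positions in combinations(range(n_bits), weight):
            def parity(sample):
                value = 0
                for p in positions:
                    value ^= sample[p]
                return value
            target = parity(samples[0])
            # all() stops at the first disagreeing sample (early rejection).
            if all(parity(s) == target for s in samples[1:]):
                found.append((positions, target))
    return found

# Synthetic samples with two planted relations: b3 = b0 xor b1, and b5 = 1.
rng = random.Random(7)
samples = []
for _ in range(64):
    bits = [rng.getrandbits(1) for _ in range(6)]
    bits[3] = bits[0] ^ bits[1]
    bits[5] = 1
    samples.append(tuple(bits))

equations = find_linear_equations(samples)
assert ((0, 1, 3), 0) in equations
assert ((5,), 1) in equations
```

Equations implied by others, such as the XOR of the two planted relations, are reported as well; a real search would reduce the result to a linearly independent set.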


5.5 Fix solvability over first steps 


This step is not required when there are sufficient degrees of freedom in the non-linear part, 
as was the case in the first near-collision attack. As already noted, in the case of the second 
near-collision attack, the non-linear path has to start with a fully fixed chaining value and 
has significantly more conditions in the first steps. As a result the construction of a very 
good and solvable non-linear differential path for the second near-collision attack turned out 
to be quite complex. Our initially constructed paths unfortunately proved to be unsolvable 
over the first few steps. We tried several approaches including using the guess-and-determine 
non-linear path construction to make corrections as done by Karpman et al. [18], as well 
as using worse differential path construction parameters, but all these attempts led to 
results that not only were unsatisfactory but that even threatened the feasibility of the 
second near-collision attack. Specifically, both using the guess-and-determine approach 
as well as the meet-in-the-middle approach with a later connecting step, the resulting 
differential paths had a significantly increased number of conditions, bringing the total number 
of degrees of freedom critically low. Moreover, the additional conditions easily conflicted 
with candidate speed-up measures named “boomerangs” necessary to bring the attack’s 
complexity down to a feasible level. Our final solution was to encode this problem into a 
satisfiability (SAT) problem and use a SAT solver to find a drop-in replacement differential 
path over the first eight steps that is solvable. 

More specifically, we adapted the SHA-1 SAT system generator from Nossum³ to 
generate two independent 8-step compression function computations, which we then linked 
by adding constraints that set the given input chaining value pair, the message XOR 
differences over $m_0, \ldots, m_7$, the path differences of $A_4, \ldots, A_8$ and the path conditions 
of $A_5, \ldots, A_8$. In effect, we allowed complete freedom over $A_1, A_2, A_3$ and some freedom 
over $A_4$. All solutions were exhaustively generated by MiniSAT⁴ and then converted into 
drop-in replacement paths, from which we kept the one with fewest conditions. 

This allowed us to build our attack on the best non-linear path we found without any 
compromises and the corrected non-linear path is presented in Figure 3, Section A. Note 
that indeed the system of equations is over-defined: over the first five steps, there are only 
15 state bits without an equation, while at the same time there are 23 message equations. 


5.6 Find message modifications to speed-up collision search 


To speed up the collision search significantly it is important to employ message modification 
rules that make small changes in the current message block that do not affect any bit 
involved with the state and message bit equations up to some step n (with sufficiently high 
probability). This effectively allows such a message modification rule to be applied to one 
solution up to step n to generate several solutions up to the same step with almost no 
additional cost, thereby significantly reducing the average cost to generate solutions up to 
step n. 

The first such speed-up technique that was developed in attacks on the MD-SHA family 
was called neutral bits, introduced by Biham and Chen to improve attacks on SHA-0 [2]. A 
message bit is neutral up to a step n if flipping this bit causes changes that do not interact 
with differential path conditions up to step n with high probability. As the diffusion of 
SHA-0/SHA-1’s step function is rather slow, it is not hard to find many bits that are 
neutral for a few steps. 

A nice improvement of the original neutral bits technique was ultimately described by 
Joux and Peyrin as “boomerangs” [17]. It consists in carefully selecting a few bits that 
are all flipped together in such a way that this effectively flips, say, only one state bit in 
the first 16 steps, and such that the diffusion of uncontrollable changes is significantly 
delayed. This idea can be instantiated efficiently by flipping together bits that form a local 
collision for the step function. This local collision will eventually introduce uncontrollable 
differences through the message expansion. However, these do not appear immediately, 
and if all conditions for the local collision to be successful are verified, the first few steps 
after the introduction of its initial perturbation will be free of any difference. Joux and 
Peyrin then noted that sufficient conditions for the local collision can be pre-satisfied when 
creating the initial partial solution, effectively leading to probability-one local collisions. 
This leads to a few powerful message modification rules that are neutral up to very late 
steps. 


3 https://github.com/vegard/sha1-sat 
4 http://minisat.se/ 

A closely related variant of boomerangs was named advanced message modification by 
Wang et al. in their attacks on the MD-SHA family (see e.g. [42]). While the objective of 
this technique is also to exploit the available freedom in the message, it applies this in a 
distinct way by identifying ways of interacting with an isolated differential path condition 
with high probability. Then, if an initial message pair fails to verify a condition for which a 
message modification exists, the bits of the latter are flipped, so that the resulting message 
pair now verifies the condition with high probability. 

In our attack, we used both neutral bits and boomerangs as message modification 
rules. This choice was particularly motivated by the ability to efficiently implement these 
speed-up techniques on GPUs, used to compute the second block of the collision, similar 
to [18, 37]. 

Our search process for finding the neutral bits follows the one described in [37]. Potential 
boomerangs are selected first, one being eligible if its initial perturbation does not interact 
with differential path conditions and if the corrections of the local collision do not break 
some linear message bit relation (this would typically happen if an odd number of bits to 
be flipped are part of such a relation). The probability with which a boomerang eventually 
interacts with path conditions is then evaluated experimentally by activating it on about 
4000 independent partial solutions; the probability threshold used to determine up to 
which step a boomerang can be used is set to 0.9, meaning that it can be used to generate 
an additional partial solution at step n from an existing one if it does not interact with 
path conditions up to step n with probability more than 0.1. Once boomerangs have 
been selected, the sufficient conditions necessary to ensure that their corresponding local 
collisions occur with probability one are added to the differential path, and all remaining 
free message bits are tested for neutrality using the same process (i.e., a bit is only eligible 
if flipping it does not trivially violate path conditions or make it impossible to later satisfy 
message bit relations, and its quality is evaluated experimentally). 
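
The experimental evaluation described above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the procedure used for the attack: messages are drawn at random instead of being partial solutions, only the first (IF) round of the SHA-1 step function is modelled, and a "condition" is simplified to a state bit (step, position) that must keep its value when the message bit is flipped. The names `first_steps` and `neutrality` are hypothetical.

```python
import random

MASK = 0xFFFFFFFF
K1 = 0x5A827999  # SHA-1 constant of the first round
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def first_steps(iv, words, steps):
    """Return the state words A_1..A_steps (steps <= 16, first round only)."""
    a, b, c, d, e = iv
    states = []
    for t in range(steps):
        f = (b & c) | (~b & d)  # bitwise IF of the first round
        new_a = (rotl(a, 5) + f + e + K1 + words[t]) & MASK
        a, b, c, d, e = new_a, a, rotl(b, 30), c, d
        states.append(a)
    return states


def neutrality(word, bit, conditions, upto, trials=4000, seed=1):
    """Estimate the probability that flipping message bit (word, bit) leaves
    every conditioned state bit (step, position) with step <= upto unchanged.
    Toy assumption: a condition is 'this state bit keeps its value'."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        m = [rng.getrandbits(32) for _ in range(16)]
        base = first_steps(IV, m, upto)
        m[word] ^= 1 << bit
        flipped = first_steps(IV, m, upto)
        if all((((base[s - 1] ^ flipped[s - 1]) >> p) & 1) == 0
               for (s, p) in conditions if s <= upto):
            kept += 1
    return kept / trials


# A bit would be accepted up to step `upto` if the estimate reaches 0.9.
usable = neutrality(13, 12, [(14, 0), (15, 5)], upto=13) >= 0.9
```

In this toy model a bit of m13 cannot influence any state word up to A13, so the estimate above is exactly 1; the interesting cases are the later steps, where the estimate drops as the flipped bit diffuses.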

The list of neutral bits and boomerangs used for the second block of the attack is given 
in Section A. There are 51 neutral bits, located on message words m11 to m15, and three 
boomerangs each made of a single local collision started on m06 (for two of them) or m09. 


5.7 Attack implementation 


A final step in the design of the attack is to implement it. This is needed for obvious 
reasons if the goal is to find an actual collision as we do here, but it is also a necessary 
step if one wishes to obtain a precise estimate of the complexity of the attack. Indeed, 
while the complexity of the probabilistic phase of the attack can be accurately computed 
using JLCA (or can also be experimentally determined by sampling many mock partial 
solutions), there is much more uncertainty as to “where” this phase actually starts. In 
other words, it is hard to exactly predict how effective the speed-up techniques can be 
without actually implementing them. The only way to determine the real complexity of an 
attack is then to implement it, measure the rate of production of partial solutions up to a 
step where there is no difference in the differential path for five consecutive state words, 
and use JLCA to compute the exact probability of obtaining a (near-)collision over the 
remaining steps. 


The first near-collision block pair of the attack was computed with CPUs, using an 
adapted version of the HashClash software [15]. As the original code was not suitable to run 
on a large scale, a significant effort was spent to make it efficient on the hundreds of cores 
necessary to obtain a near-collision in reasonable time. The more expensive computation of 
the second block was done on GPUs, based on the framework used by Karpman et al. [18], 
which we briefly describe below. 


The main structure used in this framework consists in first generating base solutions 
on CPUs that fix the sixteen free message words, and then using GPUs to extend these 
to partial solutions up to a late step, by only exploiting the freedom offered by speed-up 
techniques (in particular neutral bits and boomerangs). These partial solutions are then 
sent back to a CPU to check if they result in collisions. 


The main technical difficulty of this approach is to make the best use of the power 
offered by GPUs. Notably, their programming model differs from the one of CPUs in 
how diverse the computations run on their many available cores can be: on a multicore 
CPU, every core can be used to run an independent process; however, even if a recent 
GPU can feature many more cores than a CPU (for instance, the Nvidia GTX 970 used 
in [18, 37] and the initial implementation of this attack features 1664 cores), they can 
only be programmed at the granularity of warps, made of 32 threads which must then 
run the same code. Furthermore, divergence in the control flow of threads of a single warp 
is dealt with by serializing the diverging computations; for instance, if a single thread 
takes a different branch than the rest of the warp in an if statement, all the other threads 
become idle while it is taking its own branch. This limitation would make a naive parallel 
implementation of the usage of neutral bits rather inefficient, and there is instead a strong 
incentive to minimize control-flow divergence when implementing the attack. 


The approach taken by [18] to limit the impact of the inherent divergence in neutral 
bit usage is to decompose the attack process step by step and to use the fair amount of 
memory available on recent GPUs to store partial solutions up to many different steps 
in shared buffers. In a nutshell, all threads of a single warp are asked to load their own 
partial solution up to a certain state word A_i, and they will together apply all neutral 
bits available at this step, each time checking if the solution can be validly extended to a 
solution up to A_{i+1}; if and only if this is the case, this solution is stored in the buffer for 
partial solutions up to A_{i+1}, and this selective writing operation is the only moment where 
the control flow of the warps may diverge. 
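
The buffer-based organisation can be mimicked on a CPU with the following toy sketch. It is only meant to show the data flow: partial solutions are modelled as plain integers, `extends` is a hypothetical predicate standing for "the candidate is a valid partial solution one step further", and the inner loop over the batch stands for the 32 threads of a warp running in lockstep.

```python
from collections import deque
from itertools import product

WARP = 32  # number of threads per warp


def run_step(buffers, step, neutral_bits, extends):
    """Process one warp-sized batch of the partial solutions stored for `step`.

    buffers      -- dict mapping a step index to a deque of partial solutions
    neutral_bits -- bit positions that may be flipped at this step
    extends      -- hypothetical predicate (candidate, step) -> bool
    """
    count = min(WARP, len(buffers[step]))
    batch = [buffers[step].popleft() for _ in range(count)]
    for flips in product((0, 1), repeat=len(neutral_bits)):
        mask = 0
        for bit, on in zip(neutral_bits, flips):
            if on:
                mask |= 1 << bit
        for solution in batch:  # every lane applies the same flips
            candidate = solution ^ mask
            if extends(candidate, step):
                # the selective write is the only divergent operation
                buffers[step + 1].append(candidate)


buffers = {0: deque([0]), 1: deque()}
run_step(buffers, 0, neutral_bits=[0, 1], extends=lambda c, s: c % 2 == 0)
```

Because every lane applies the same combination of flips at the same time, the only data-dependent branch is the decision to write a surviving candidate to the next buffer.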


To compute the second block pair of the attack, and hence obtain a full collision, we first 
generated base solutions consisting of partial solutions up to A14 on CPU, and used GPUs 
to generate additional partial solutions up to A26. These were further probabilistically 
extended to partial solutions up to A53, still using GPUs, and checking whether they 
resulted in a collision was finally done on a CPU. The probability of such a partial solution 
to also lead to a collision can be computed by JLCA to be equal to 2^{-27.8}, and 2^{-48.7} 
for partial solutions up to A33 (these probabilities could in fact both be reduced by a 
factor 2^{0.6}; however, the ones indicated here correspond to the attack we carried out). On a 
GTX 970, a prototype implementation of the attack produced partial solutions up to A33 
at a rate of approximately 58100 per second, while the full SHA-1 compression function 
can be evaluated about 2^{31.8} times per second on the same GPU. Thus, our attack has an 
expected complexity of 2^{64.7} on this platform. 
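
The expected complexity on the GTX 970 follows from the three quantities just given: the expected number of A33 partial solutions per collision, the rate at which they are produced, and the rate of SHA-1 compression calls on the same device. A minimal check of this arithmetic, with variable names of our own choosing:

```python
import math

p_collision = 2.0 ** -48.7   # probability that an A33 partial solution gives a collision
rate_partial = 58_100        # A33 partial solutions per second on a GTX 970
rate_sha1 = 2.0 ** 31.8      # SHA-1 compression calls per second on the same GPU

seconds = 1 / (p_collision * rate_partial)    # expected time to a collision
log2_cost = math.log2(seconds * rate_sha1)    # cost in compression calls (log2)
print(f"2^{log2_cost:.1f}")                   # prints 2^64.7
```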


Finally, adapting the prototype GPU implementation to a large-scale infrastructure 
suitable to run such an expensive computation also required a fair amount of work. 

6 Computation of the collision 


This section gives some details about the computation of the collision and provides a few 
comparisons with notable cryptographic computations. 


6.1 Units of complexity 


The complexity figures given in this section follow the common practice in the cryptanalysis 
of symmetric schemes of comparing the efficiency of an attack to the cost of using a generic 
algorithm achieving the same result. This can be done by comparing the time needed, with 
the same resources, to e.g. compute a collision on a hash function by using a (memoryless) 
generic collision search versus by using a dedicated process. This comparison is usually 
expressed by dividing the time taken by the attack, e.g. in core hours, by the time taken 
to compute the attacked primitive once on the same platform; the cost of using a generic 
algorithm is then left implicit. This is for instance how the figure of 2^{64.7} from Section 5.7 
has been derived. 

While this approach is reasonable, it is far from being as precise as what a number 
such as 2^{64.7} seems to imply. We discuss below a few of its limitations. 


The impact of code optimization. An experimental evaluation of the complexity of 
an attack is bound to be sensitive to the quality of the implementation, both of the attack 
itself and of the reference primitive used as a comparison. A hash function such as SHA-1 
is easy to implement relatively efficiently, and the difference in performance between a 
reference and optimized implementation is likely to be small. This may however not be 
true for the implementation of an attack, which may have a more complex structure. 
A better implementation may then decrease the “complexity” of an attack without any 
cryptanalytical improvements. 

Although we implemented our attack in the best way we could, one cannot exclude that 
a different approach or some modest further optimizations may lead to an improvement. 
However, barring a radical redesign, the associated gain should not be significant; the 
improvements brought by some of our own low-level optimizations were typically about 
15%. 


The impact of the attack platform. The choice of the platform used to run the 
attack may have a more significant impact on its evaluated complexity. While a CPU is by 
definition suitable to run general-purpose computations, this is not the case of e.g. GPUs. 
Thus, the gap between the speed of a simple computation, such as evaluating the compression 
function of SHA-1, and that of a more complex one, such as our attack, need not be the same on 
the two kinds of architectures. For instance, the authors of [18] noticed that their 76-step 
freestart attack could be implemented on CPU (a 3.2 GHz Haswell Core i5) for a cost 
equivalent to 2^{49.1} compression function computations, while this increased to 2^{50.25} on 
their best-performing GTX 970, and 2^{50.34} on average. 

This difference leads to a slight paradox: from an attacker’s point of view, it may 
seem best to implement the attack on a CPU in order to be able to claim a better attack 
complexity. However, a GPU being far more powerful, it is actually much more efficient 
to run it on the latter: the attack of [18] takes only a bit more than four days to run on 
a single GTX 970, which is much less than the estimated 150 days it would take using a 
single quad-core CPU. 
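
The running time on a single GTX 970 can be recovered from the complexity expressed in compression calls. The short check below assumes the best-case GPU cost of 2^{50.25} calls reported in [18] and the GTX 970 throughput of about 2^{31.8} SHA-1 compression calls per second quoted in Section 5.7.

```python
cost_calls = 2.0 ** 50.25     # attack cost of [18] on the best GTX 970, in compression calls
rate_sha1 = 2.0 ** 31.8       # SHA-1 compression calls per second on a GTX 970

days = cost_calls / rate_sha1 / 86_400   # 86 400 seconds per day
print(f"{days:.1f} days")
```

The result is a little above four days, in line with the figure stated above.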

We did not write a CPU (resp. GPU) implementation of our own attack for the search 
of the second (resp. first) block, and are thus unable to make a similar comparison for the 
present full hash function attack. However, as we used the same framework as [18], it is 
reasonable to assume that the gap would be of the same order. 

How to pick the best generic attack. As we pointed out above, the common methodology 
for measuring the complexity of an attack leaves implicit the comparison with a 
generic approach. This may introduce a bias in suggesting a strategy for a generic attacker 
that is in fact not optimal. This was already hinted at in the previous paragraph, where we 
remarked that an attack may seem to become worse when implemented on a more efficient 
platform. In fact, the underlying assumption that a generic attacker would use the same 
platform as the one on which the cryptanalytic attack is implemented may not always be 
justified: for instance, even if the latter is run on a CPU, there is no particular reason why a 
generic attacker would not use more energy-efficient GPUs or FPGAs. It may thus be hard 
to precisely estimate the absolute gain provided by a cryptanalytic attack compared to the 
best implementation of a generic algorithm with identical monetary and time resources, 
especially when these are high. 

The issues raised here could all be addressed in principle by carefully implementing, say 
van Oorschot and Wiener’s parallel collision search on a cluster of efficient platforms [41]. 
However, this is usually not done in practice, and we made no exception in our case. 

Despite the few shortcomings of this usual methodology used to evaluate the complexity 
of attacks, it remains in our opinion a reliable measure thereof, one that allows 
different attack efforts to be compared reasonably well. For want of a better one, it is also 
the approach used in this paper. 


6.2 The computation 


The major challenge when running our near-collision attacks distributed across the world 
was to adapt the attacks into a distributed computation model which pursues two goals: 
the geographically distributed workers should work independently without duplication of 
work, and the amount of computational time wasted due to worker failures should be 
minimized. The first goal required storage with the ability to endure high loads of requests 
coming from all around the globe. For the second goal, the main sources of failures we found 
were preemption by higher-priority workers and bugs in GPU hardware. To diminish the 
impact of these failures, we learned to predict failures in the early stages of computation 
and terminated workers without wasting significant amounts of computational time. 


First near-collision attack. The first phase of the attack, corresponding to the generation 
of first-block near collisions, was run on a heterogeneous CPU cluster hosted by Google, 
spread over 8 physical locations. The computation was split into small jobs of expected 
running time of one hour, whose objectives were to compute partial solutions up to step 
61. The running time of one hour proved to be the best choice to be resilient against 
various kinds of failures (mostly machine failure, preemption by other users of the cluster, 
or network issues), while limiting the overhead of managing many jobs. A MapReduce 
paradigm was used to collect the solutions of a series of smaller jobs; in hindsight, this was 
not the best approach, as it introduced an unnecessary bottleneck in the reduce phase. 
The first first-block near collision was found after spending about 3583 core years that 
had produced 180 711 partial solutions up to step 61. A second near collision block was 
then later computed; it required an additional 2987 core years and 148 975 partial solutions. 
There was a variety of CPUs involved in this computation, but it is reasonable to assume 
that they all were roughly equivalent in performance. On a single core of a 2.3GHz Xeon 
E5-2650v3, the OpenSSL implementation of SHA-1 can compute up to 2^{23.3} compression 
functions per second. Taking this as a unit, the first near-collision block required an effort 
equivalent to 2^{60} SHA-1 compression function calls, and the second first block required 
2^{59.75}. 
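
The conversion from core years to equivalent compression function calls is a direct product of the running time and the per-core SHA-1 rate. The sketch below assumes a year of 365.25 days and the rate of 2^{23.3} calls per second given above; the function name is ours.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
LOG2_SHA1_RATE = 23.3   # log2 of SHA-1 compression calls per second on one core


def log2_calls(core_years):
    """log2 of the number of SHA-1 compression calls performed in `core_years`."""
    return math.log2(core_years * SECONDS_PER_YEAR) + LOG2_SHA1_RATE


first_block = log2_calls(3583)    # first near-collision block
second_block = log2_calls(2987)   # redundant second first-block near collision
print(f"2^{first_block:.2f}, 2^{second_block:.2f}")
```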

Second near-collision attack. The second, more expensive phase of the attack was 
run on a heterogeneous cluster of K20, K40 and K80 GPUs, also hosted by Google. It 
corresponded to the generation of a second-block near-collision leading to a full collision. 

The overall setup of the computation was similar to the one of the first block, except 
that it did not use a MapReduce approach and resorted to using simpler queues holding 
the unprocessed jobs. A worker would then select a job, potentially produce one or several 
partial solutions up to step 61, and die on completion. 

The collision was found after 369 985 partial solutions had been produced⁵. The 
production rates of partial 61-step solutions of the different devices used in the cluster 
were of 0.593 per hour for the K80 (which combines two GPU chips on one card), 0.444 for 
the K40 and 0.368 for the K20. The time needed for a homogeneous cluster to produce the 
collision would then have been of 114 K20-years, 95 K40-years or 71 K80-years. 

The rate at which these various devices can compute the compression function of SHA-1 
is, according to our measurements, 2^{31.1} s^{-1} for the K20, 2^{31.3} s^{-1} for the K40, and 2^{31} s^{-1} 
for the K80 (2^{30} s^{-1} per GPU). The effort of finding the second block of the collision for 
homogeneous clusters, measured in number of equivalent calls to the compression function, 
is thus equal to 2^{62.8} for the K20 and K40 and 2^{62.1} for the K80. 
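
The device-year and compression-call figures for homogeneous clusters follow from the number of 61-step partial solutions, the production rates and the measured SHA-1 throughputs. The sketch below redoes this arithmetic under the assumption of a 365.25-day year; it agrees with the figures above to within rounding.

```python
import math

HOURS_PER_YEAR = 365.25 * 24
SOLUTIONS = 369_985   # 61-step partial solutions produced before the collision

# device: (61-step solutions per hour, log2 of SHA-1 compression calls per second)
devices = {"K20": (0.368, 31.1), "K40": (0.444, 31.3), "K80": (0.593, 31.0)}


def effort(device):
    """Return (device-years, log2 of equivalent SHA-1 compression calls)."""
    per_hour, log2_sha1_rate = devices[device]
    hours = SOLUTIONS / per_hour
    return hours / HOURS_PER_YEAR, math.log2(hours * 3600) + log2_sha1_rate


for name in devices:
    years, log2_effort = effort(name)
    print(f"{name}: {years:.1f} device-years, 2^{log2_effort:.2f} calls")
```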

Although a GTX 970 was only used to prototype the attack, we can also consider its 
projected efficiency and measure the effort spent for the attack w.r.t. this GPU. From the 
measured production rate of 58100 step 33 solutions per second, we can deduce that 0.415 
step 61 solutions can be computed per hour on average. This leads to a computational 
effort of 102 GPU years, equivalent to 2^{63.4} SHA-1 compression function calls. 

The monetary cost of computing the second block of the attack by renting Amazon 
instances can be estimated from these various data. Using p2.16xlarge instances, each featuring 
16 K80 GPUs and nominally costing US$14.4 per hour, would cost US$560K for the 
necessary 71 device years. It would be more economical for a patient attacker to wait for 
low “spot prices” of the smaller g2.8xlarge instances, which feature four K520 GPUs, 
roughly equivalent to a K40 or a GTX 970. Assuming thus an effort of 100 device years, 
and a typical spot price of US$0.5 per hour, the overall cost would be of US$110K. 
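
Both cost estimates are obtained by converting device years into instance hours and multiplying by the hourly price. A minimal sketch, assuming a 365.25-day year and the prices quoted above (the helper name is ours):

```python
HOURS_PER_YEAR = 365.25 * 24


def rental_cost(device_years, gpus_per_instance, usd_per_instance_hour):
    """Cost in US$ of renting enough instance hours to cover `device_years`."""
    instance_hours = device_years * HOURS_PER_YEAR / gpus_per_instance
    return instance_hours * usd_per_instance_hour


p2_cost = rental_cost(71, 16, 14.4)   # p2.16xlarge at the nominal price
g2_cost = rental_cost(100, 4, 0.5)    # g2.8xlarge at a typical spot price
print(f"US${p2_cost / 1000:.0f}K, US${g2_cost / 1000:.0f}K")
```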

Finally, summing the cost of each phase of the attack in terms of compression function 
calls, we obtain a total effort of 2^{63.1}, including the redundant second near-colliding first 
block and taking the figure of 2^{62.8} for the second block collision. This should however not 
be taken as an absolute number; depending on luck and equipment but without changing 
any of the cryptanalytical aspects of our attack, it is conceivable that the spent effort could 
have been anywhere from, say, 2^{62.3} to 2^{65.1} equivalent compression function calls. 
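
The total can be checked by adding the three phases, namely the two first-block near collisions (2^{60} and 2^{59.75}) and the second block (2^{62.8}), and taking the base-2 logarithm of the sum:

```python
import math

phases_log2 = [60.0, 59.75, 62.8]   # first block, redundant first block, second block
total_log2 = math.log2(sum(2.0 ** e for e in phases_log2))
print(f"2^{total_log2:.1f}")        # prints 2^63.1
```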


6.3 Complexity comparisons 


We put our own result into perspective by briefly comparing its complexity to a few other 
relevant cryptographic computations. 


Comparison with MD5 and SHA-0 collisions. An apt comparison is first to consider 
the cost of computing collisions for MD5 [31], a once very popular hash function, and 
SHA-0 [26], identical to SHA-1 but for a missing rotation in the message expansion. The 
best-known identical-prefix collision attacks for these three functions are all based on the 
same line of work by Wang et al. from the mid-2000s [43, 44, 42], but have widely 
varying complexities. 

The best current identical-prefix collision attacks on MD5 are due to Stevens et al., 
and require the equivalent of about 2^{16} compression function calls [39]. Furthermore, in 
the same paper, chosen-prefix collisions are computed for a cost equivalent to about 2^{39} 
calls, increasing to 2^{49} calls for a three-block chosen-prefix collision as was generated on 
200 PS3s for the rogue Certification Authority work. 


5 We were quite lucky in that respect. The expected number required is about 2.5 times more than that. 

Though very similar to SHA-1, SHA-0 is much weaker against collision attacks. The 
best current such attack on SHA-0 is due to Manuel and Peyrin [24], and requires the 
equivalent of about 2^{33.6} calls to the compression function. 

Identical-prefix collisions for MD5 and SHA-0 can thus be obtained within a reasonable 
time by using very limited computational power, such as a decent smartphone. 


Comparison with RSA modulus factorization and prime field discrete logarithm 
computation. Some of the most expensive attacks implemented in cryptography 
are in fact concerned with establishing records of factorization and discrete logarithm 
computations. We believe that it is instructive to compare the resources necessary in both 
cases. As an example, we consider the 2009 factorization of a 768-bit RSA modulus from 
Kleinjung et al. [19] and the recent 2016 discrete logarithm computation in a 768-bit prime 
field from Kleinjung et al. [20]. 

The 2009 factorization required about 2000 core years on a 2.2GHz AMD Opteron of 
the time. The number of single instructions to have been executed is estimated to be of 
the order of 2^{67} [19].⁶ 

The 2016 discrete logarithm computation was a bit more than three times more 
expensive, and required about 5300 core years on a single core of a 2.2GHz Xeon E5- 
2660 [20]. 

In both cases, the overall computational effort could have been decreased by reducing 
the time that was spent collecting relations [19, 20]. However, this would have made the 
following linear-algebra step harder to manage and a longer computation in calendar time. 
Kleinjung et al. estimated that a shorter sieving step could have resulted in a discrete 
logarithm computation in less than 4000 core years [20]. 

To compare the cost of the attacks, we can estimate how many SHA-1 (compression 
function) calls can be performed in the 5300 core years of the more expensive discrete 
logarithm record [20]. Considering again a 2.3GHz Xeon E5-2650 (slightly faster than the 
CPU used as a unit by Kleinjung et al.) running at about 2^{23.3} SHA-1 calls per second, 
the overall effort of [20] is equivalent to approximately 2^{60.6} SHA-1 calls. It is reasonable 
to expect that even on an older processor the performance of running SHA-1 would not 
decrease significantly; taking the same base figure per core would mean that the effort 
of [19] is equivalent to approximately 2^{58.9} ∼ 2^{59.2} SHA-1 calls. 
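
These equivalences, and the resulting ratio to our own attack, can be reproduced as follows. The sketch assumes a 365.25-day year, the per-core rate of 2^{23.3} SHA-1 calls per second, and the total of 2^{63.1} calls from Section 6.2; the two factorization values correspond to the estimates of 1700 and 2000 core years.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
LOG2_SHA1_RATE = 23.3     # log2 of SHA-1 calls per second on one core
LOG2_ATTACK = 63.1        # total effort of the collision attack (Section 6.2)


def log2_sha1_calls(core_years):
    """log2 of the SHA-1 calls that fit in `core_years` of a single core."""
    return math.log2(core_years * SECONDS_PER_YEAR) + LOG2_SHA1_RATE


dlog = log2_sha1_calls(5300)          # discrete logarithm record [20]
fact_low = log2_sha1_calls(1700)      # factorization [19], lower estimate
fact_high = log2_sha1_calls(2000)     # factorization [19], higher estimate
ratio = 2.0 ** (LOG2_ATTACK - dlog)   # how much cheaper [20] is than the attack
print(f"2^{dlog:.1f}, 2^{fact_low:.1f}, 2^{fact_high:.1f}, ratio {ratio:.1f}")
```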

In absolute value, this is less than the effort of our own attack, the more expensive 
discrete logarithm computation being about five times cheaper⁷, and less than twice as 
expensive as computing a single first-block near collision. However, the use of GPUs for 
the computation of the second block of our attack allowed us both to significantly decrease 
the calendar time necessary to perform the computation, and to improve its efficiency in terms of 
necessary power: as an example, the peak power consumption of a K40 is only 2.5 times 
that of a 10-core Xeon E5-2650, yet it is about 25 times faster at computing the compression 
function of SHA-1 than the whole CPU, and thence 10 times more energy efficient overall. 
The energy required to compute a collision using GPUs is thus about half of 
that required for the discrete logarithm computation⁸. As a conclusion, computing a 
collision for SHA-1 seems to need slightly more effort than 768-bit RSA factorization or 
prime-field discrete logarithm computation but, if done on GPUs, the amount of resources 
necessary to do so is slightly less. 


6 Note that the comparison between factorization and discrete logarithm computation mentioned in [20] 
gives for the former a slightly lower figure of about 1700 core years. 

7 But now is also a good time to recall that directly comparing CPU and GPU cost is tricky. 

8 This is assuming that the total energy requirements scale linearly with the consumption of the processing 
units. 

A The attack parameters 


The first block of the attack uses the same path and conditions as the one given in [36, 
Section 5], which we refer to for a description. This section gives the differential path, 
linear (message) bit relations and neutral bits used in our second near-collision attack. 

We use the notation of Table 3 to represent signed differences of the differential path 
and to indicate the position of neutral bits. 

We give the differential path of the second block up to A23 in Figure 3. We also 
give necessary conditions for A22 to A26 in Table 4, which are required for all alternate 
differential paths allowed. In order to maximize the probability, some additional conditions 
are also imposed on the message. These message bit relations are given in Table 5, and 
graphically in Figure 7. The rest of the path can then be determined from the disturbance 
vector. 

We also give the list of the neutral bits used in the attack. There are 51 of them over 
the five message words m11 to m15, distributed as follows (visualized in Figure 4): 


— m11: bit positions (starting with the least significant bit at zero) 7, 8, 9, 10, 11, 12, 13, 
14, 15 
— m12: positions 2, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 
— m13: positions 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 30 
— m14: positions 4, 6, 7, 8, 9, 10 
— m15: positions 5, 6, 7, 8, 9, 10, 12 


[Fig. 3: differential path of the second block over steps i = -4 to 23 (columns i, A_i, m_i); 
the figure body could not be recovered from the extracted text.] 


Not all of the neutral bits of the same word (say mj 3) are neutral up to the same point. 
Their repartition in that respect is as follows, a graphical representation being also given 
in Figure 5. 


— Bits neutral up to Aja (included): m1(9,10,11,12,13,14,15], 


m42(2,14,15,16,17,18,19,20], ™m13[12,16] 


Table 3: Meaning of the bit difference symbols, for a symbol located on A;[i]. The same 
symbols are also used for m. 


Symbol Condition on (A,A) Symbol Condition on (A,A) 

Aili] = Ali] « Ali] = Ali] = Ali] 
e Ali] + Zli] Aili] = Ali] + Aali] 
“ A,fi]=0,  A,[i]=1 > Adi] = Ali] = (ALDI 
x Ali]=1, A[i]=0 ¢ = Aili] = Zli] + (42A 
v Ali] = Ai[i] = 0 o Ali] = Afi] = (AL. [A] 
a Ali = Ai{i}=1 _ «= A,[é] = A + (AC?) [A] 
* No condition on A;[7], Afi] 
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A bit neutral to A; is then used to produce partial solutions at A;,;. One should also note 
that this list only includes a single bit per neutral bit group, and some additional flips may 


Table 4: Additional necessary conditions used for Ag2 to Agg. 


A22[27] S>] m23[27] = A»21[29] el 

A24[27] © m25[27] = A23[29] 

A25[28] ® m25[27] = A23[30] © 1 

Aə26[27] @ mMg7[27] = Aas [29] 
A24[30] = m23[30] : A25[29] © m23[27] = A24[31] 
A24[30] + m23[30] : A24[31] = m23[30] 


m23[27] @ m23[28] 21 m23[30] (S>) mə24[3] =í m23[30] D m2s[28] =1 
m23[4] = 0 m24[28] = 0 m24[29] = 0 
meal 2] =0 m26[28] @ m26[29] =i m27[29] =0 
m2s[27] =0 mes[4] (<>) m32[29] =0 m36[4] @ maa[28] =1 
m3s[4] ® ma4[28] = 0 m39[30] ® maa[28] = 1 mao[3] ® ma4[28] = 0 
mao[4] @ maa[28] al MAL [29] ® mai [30] =0 maz[2 8] [<>] maa[28] =0 
ma3[28] ® ma4[28] = 0 ma3[29] ® ma4[28] = 1 ma3[4] ® ma7[29] = 0 
maa[28] ® ma4[29] = 1 mas[29] ® ma7[29] = 0 ma6[29] ® ma7[29] = 0 
mas[4] ® ms52 [29] =0 mso[29] ($>) ms2[29] =0 msi [2 9] ® ms52 [29] =0 
ms4[4] (<>) meo[29] =1 ms6[29] (<>) meo[29] =] ms6[4] @ meo[29] =0 
ms57[29] ® meo[29] = 1 msg9[29] ® m6o[29] = 0 me7[0] ® m72[30] = 1 
mes[5] ® m72 [30] =0 mzo[1] ®mr71 [6] =1 mr71 [0] [S>] m76[30] =i 
m72[5] ® m76[30] = 0 m73[2] ® mzs[0] = 1 mza[1] ® mzs[6] = 1 
mra[7] ($>) mzs[0] =0 mzs[1] ® m76[6] =l mz6[0] (S>) mze[1] =i 
m76[3] =1 mz77[0] ®@ m77[1] =0 m77[0] @ m77[2] =I 
m77[8] =0 M78 [3] SL mz7s[7] =0 

mro[2] =0 m79 [4] =i 

OENE TEREREEEEEEET ee -cecccccce:---- 


Fig. 4: The 51 single neutral bits used in the second block attack. 


Bits neutral up to A15 (included): m11[7,8], m12[9,10,11,12,13], m13[15,30] 
Bits neutral up to A16 (included): m12[5,6,7], m13[10,11,13] 
Bits neutral up to A17 (included): m13[5,6,7,8,9], m14[10] 
Bits neutral up to A18 (included): m14[6,7,9], m15[10,12] 
Bits neutral up to A19 (included): m14[4,8], m15[5,6,7,8,9] 


be necessary to preserve message bit relations. 
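
The notion of a neutral bit can be illustrated with a small experiment: flip one message bit of a partial solution and check whether the state conditions up to a given step still hold. The Python sketch below does this for a single block and a single flip. It assumes the usual convention that step t consumes m_t and produces A_{t+1}; the function names and the caller-supplied predicate `conditions_ok` are hypothetical, and the extra flips that may be needed to preserve message bit relations are not modelled. In practice neutrality is a statistical property measured over many partial solutions, not over one.

```python
MASK32 = 0xFFFFFFFF
SHA1_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, n):
    """Rotate a 32-bit word left by n positions."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_states(iv, block_words, steps=80):
    """Run `steps` SHA-1 steps; return ([A_1, ..., A_steps], final registers)."""
    m = list(block_words)
    for t in range(16, 80):
        m.append(rotl(m[t - 3] ^ m[t - 8] ^ m[t - 14] ^ m[t - 16], 1))
    a, b, c, d, e = iv
    states = []
    for t in range(steps):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999       # bitwise IF
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1                # XOR
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC  # majority
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6                # XOR
        new_a = (rotl(a, 5) + (f & MASK32) + e + k + m[t]) & MASK32
        a, b, c, d, e = new_a, a, rotl(b, 30), c, d
        states.append(a)
    return states, (a, b, c, d, e)


def is_neutral_up_to(iv, block_words, word, bit_index, upto, conditions_ok):
    """Test one flip of m_word[bit_index] against conditions on A_1..A_upto.

    conditions_ok(states) is supplied by the caller and receives the list
    [A_1, ..., A_upto]; it must already hold for the unmodified block.
    """
    base, _ = sha1_states(iv, block_words, upto)
    if not conditions_ok(base):
        raise ValueError("block does not satisfy the conditions before the flip")
    flipped = list(block_words)
    flipped[word] ^= 1 << bit_index
    new, _ = sha1_states(iv, flipped, upto)
    return conditions_ok(new)
```

For example, any bit of m15 trivially leaves A_1 to A_15 unchanged, since m15 is first used to compute A_16; the interesting neutral bits are those whose flip also preserves the conditions on a few later state words.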


Out of the three boomerangs used in the attack, one first introduced a perturbation 
on mog on bit 7, and the other two on mpg, on bit 6 and on bit 8. All three boomerangs 
then introduce corrections to ensure a local collision. Because these local collisions happen 
in the first round, where the Boolean function is a bitwise IF, only two corrections are 
necessary for each of them. 


The lone boomerang introduced on mpg is neutral up to A22, and the couple introduced 
on mg are neutral up to Ags. The complete sets of message bits defining all of them are 


shown in Figure 6, using a “difference notation”. 
Fig. 5: The 51 single neutral bits regrouped by up to where they are neutral. 


Fig. 6: Boomerang local collision patterns using symbols. First perturbation difference is 
highlighted with a black symbol. Associated correcting differences are identified with the 
corresponding white symbol. 
B Auxiliary material 


Fig. 7: Graphical representation of the linear equations of Table 5. A “.” means no equation, 
a “0” or “1” means equal to 0 or 1, respectively. A pair of two identical letters x means 
that the two bits have the same value. A pair of two letters x and X means that the two 
bits have different values. 


B.1 An example of two colliding PDF files 


We give in Figure 8 the base64-encoded data of two compressed PDF files with the same SHA-1 
hash and distinct visual content. To check this, simply copy the string into a text file 
“coll.tar.bz2.b64” and type the following commands in a terminal: 


$ base64 --decode coll.tar.bz2.b64 > coll.tar.bz2 
$ tar -xvf coll.tar.bz2 
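
Once the archive is extracted, the collision can be confirmed by hashing both files and checking that their contents differ. The following Python sketch does this with the standard `hashlib` module. The function names are illustrative, and the two paths are supplied by the user on the command line, since the names of the extracted files are not stated here.

```python
import hashlib
import sys


def sha1_hex(data):
    """Return the SHA-1 digest of a bytes object as a hex string."""
    return hashlib.sha1(data).hexdigest()


def is_sha1_collision(path_a, path_b):
    """True if the two files differ in content but share the same SHA-1 hash."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a, data_b = fa.read(), fb.read()
    return data_a != data_b and sha1_hex(data_a) == sha1_hex(data_b)


# Usage (script name and paths are placeholders): python check.py FILE_A FILE_B
if __name__ == "__main__" and len(sys.argv) == 3:
    print(is_sha1_collision(sys.argv[1], sys.argv[2]))
```

Comparing the raw bytes rather than a second hash keeps the check independent of any other hash function.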




QlpoOTFBWSZTWbLSV5MABL///////9Pv///v////+/////8DAK739/677r+W3/75rUNr4Aa/AAAAAAA 
CgEVTRtQDQAaAOAAyGmj TQGmgAAANGgAaMIAYgGgAABoOAAAAAADQATAGQOMgDIGmjQAODRKOAaMQOD 
QAGIANGgAAGRoNGQMRpo0GIGgBoGQAATAGQOMgDIGmj QAODRKOAaMQODQAGIANGgAAGRoNGQMRpo0GI 
GgBoGQAATAGQOMgD1Gmj QAODRkOAaMQODQAGIANGgAAGRoNGQMRpo0GIGgBoGQAATAGQOMgDIGmj QAO 
DRkOAaMQODQAGIANGgAAGRONGQMRpo0GIGgBoGQAABVTUEXEZATTICnkxNR+p6E09 JppoyamjGhkm0a 
mnIyaekbUejU9JiGnqZqaaDx J6m0 JkZMQ20aYmJ6gxqMyE2TUzJqf Itligt JQJ£Yb19Zy9Qj QuB5mHQ 
RdSSXCCTHMgmSDYmd0o0mLTBJWiCpOhMQYpQ1OYpJ jn+wQUJSTCEpOMekaFaaNB6g1CCOhKEJdHr6Bm 
UIHeph7YxS8WJYyGwgWnMTF JBDFSxSCCY1j iEk7HZgJzJVDHJxMgY6tCEIIWgskS1SZ0S8GckoIIF+5 
51Ro4RCw260VCEpWJS1pWx/PMrLyVoyhWMAneDi1BcUIeZ1j6NCkusOqUCWnahhk5KT4GpWMh3vm2nJ 
WjTL9Qg+84iExBJhNKpbV9tvEN265t3fu/TKkt4rXFTsV+NcupJXhOhOhIMQQktrqt4K8mSh9M2DA02 
X7uXGVLOYQxUtzQmS7uBndL7M6R7vX869VxqPurenSuHYNq1lyTXOfNWLwgvK1R1FYqLCs60ChDpOHuT 
zCWscmGudLyqUuwVGG75nmyZhKpJy0E/pOZyHyrZxGM51DYIN+Jc8yVJgAykxKCEtW55M1fudLg3KkG6 
TtozalunXrroSxUpVLStWrWLF ihMnVpkyZOrQnUrE6xq1iCGt J1bAbSShMbV1CZgq1l KCOwCFCpMmUKSE 
kvFLaZC8wHOCVA1vzaJQ/T+XLb5Dh5TNM67p6KZ4e4ZSGyVENx2027LzrTIt eAreTkMZpW95GSOCEJY 
hMc4nToTJOwQhKEyddaLb/rTqmgJS1kpnALxMh1NmuKEpkEkqhKUoEq3SokUpIQcDgW1COrYahMmLuP 
QOfHqZaF4v2W8IoJ2EhMhYmSw7 qql127WJS+G4rUp1ToFi2rSvONSrVvDUp1tQ8Lv6F8pXyxmFBSxiLs 
xg1lNC4uvXVKmAtusxXy4YXGX1ixedEvxF1ax6t8adYnYCpC6rW1ZzdZY1CCxKEv8vpbqdSsx18v1jCQv 
OKEPxPTa/5rtWSF1dSgg4z4k jf IMNt gwWoWLEsRhKxsSA9ji7V5LRPwtumeQ8V57UtFSP1UmtQd0Qfs 
eI2Ly1DMtk4J18n927w34zrWG6Pi4jzC82js/46Rt21ZoadWx0tMInS2xYmcu8m0w9PLYxQ4bdfFw3Z 
Pf /g2pzSwZDhGrZA191qky0W+yeanadC037xk496t0Dq3ctfmqmj gie81n9k6QOK1krb3dk9el4Xsu4 
4LpGcenr2eQZ1s1ThOhnE56WnxXf OBLWn9Xz15f£Mkzi4kpVxiTKGEpf fErEEMvVEeMZhU16yD1SdeJYbx 
zZGNM3ak2TAag1LZ1DCVnoM6wV5DRrycwF8Zh/fRsdmhkMf AO 1duwknrsFwrzePWeMw1107DWzymxdQw 
iSXx/1ncnn75jL9mUzw2bUDqj 20LTgtawxK2S1Qg1CCZDQMgSpEqL jRMsykM9zbSIUqil0ZNk7Nu+b5 
JODKZ1h19CtpGKgX5uyp0idoJ3we9bSrY7PupnULSeWiDpVoSmmnNUhOnYi8xyC1lkLbNmAXyoWk7GaVr 
M2umkbpqHDzDymikjetgzTocWNsJ2E0zPcfht46J4ipaxGCfF7fu00a70c82bvqo3HceIcR1shgu73s 
e08BqlLIap2z5jTOY+T2ucCnBt At va3aHdchJg9AJ5YdKHz7LoA3VKmeqxAlFyEnQLBxB2PAhAZ8Kvm 
uR6ELXws1Qr13Nd1i4nsp189jqvaNzt+OnEnIaniuP1+/U0ZdyfoZh57ku8sYHKdvfW/ j YSUks+OrK+ 
qtte+py8jWL9cOJOfV8rrH/t+85/p1z2N67p/ZsZ3 Jmdy1iL71rNxZU1x0MVI16PxX0UuGOeArW3vuE 
vJ2beoh7SGyZKHKbR2bBW01d49 JDIcVM61Qtu9U08ec8p0nXmkcponBPLNM2CwZ9kNC/4ct6rQkPkQH 
McV/8XckU4UJCy+VeTA== 


Fig. 8: Two colliding PDF files. 